DETAILED ACTION

1. Claims 1-7, 15-26, 30-32, and new claims 33-34 have been considered. Claims 1, 15,
and 19 have been amended as per Applicants' request. New claims 33-34 have been added as 
per Applicants' request. 

Papers Submitted 

2. It is hereby acknowledged that the following papers have been received and placed of 
record in the file: RCE as received 21 April 2006; Extension of Time for Three Months as
received 21 April 2006; Amendment as received 27 April 2006; and Amendment as received 21
July 2006. 

Claim Rejections - 35 USC §103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

4. Claims 1-5 and 15-22 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Parady, U.S. Patent Number 5,933,627 (herein referred to as Parady) in view of Dreibelbis et al., 
U.S. Patent Number 5,875,470 (herein referred to as Dreibelbis). 

5. Regarding claims 1 and 15, taking claim 1 as exemplary, Parady has taught an execution 
unit for execution of multiple context threads comprising:

a. An arithmetic logic unit to process data for executing threads (Parady Col. 2 lines 
18-29 and Col.3 lines 19-25),

b. Control logic to control the operation of the arithmetic logic unit (Parady Col.3
lines 10-18), 

c. A general purpose register set to store and obtain operands for the arithmetic logic 
unit, the register set comprising a plurality of two-ported random access memory 
devices assembled into banks (Parady 48 of Fig.1/Fig.3), each bank being capable
of performing a read and a write to two different words with two ports in the same 
processor cycle (Parady Fig.3 and Col.3 lines 43-49). Here, because the register 
file contains ten ports (Parady 48 of Fig. 1) and four banks (Parady Col.3 lines 43- 
49), there are inherently at least two ports per bank, therefore allowing each bank 
to write or read at least one word per bank per cycle. 
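For illustration only, the port count relied upon in item c can be checked with a short sketch. The even distribution of ports across banks and the helper name ports_per_bank are assumptions made for this example; neither is taken from Parady.

```python
def ports_per_bank(total_ports: int, banks: int) -> list[int]:
    """Spread ports over banks as evenly as possible (an assumed layout)."""
    if total_ports < 0 or banks <= 0:
        raise ValueError("need a non-negative port count and at least one bank")
    base, extra = divmod(total_ports, banks)
    # The first `extra` banks each receive one additional port.
    return [base + 1 if i < extra else base for i in range(banks)]


# Ten ports and four banks, the figures cited from Parady above.
layout = ports_per_bank(10, 4)
print(layout, min(layout))
```

Under that assumed even distribution, no bank receives fewer than two ports, which is the minimum the reasoning in item c relies on.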

6. Parady has not explicitly taught the register set comprising two effective read ports and 
one effective write port, wherein the arithmetic logic unit can write to each bank in the general 
purpose register set using the one effective write port. However, Parady has taught that register 
files often have multiple ports (Parady column 5, lines 30-31). Dreibelbis has taught the register 
set comprising two effective read ports and one effective write port, wherein the arithmetic logic 
unit can write to each bank in the general purpose register set using the one effective write port 
(Dreibelbis Abstract; column 2, lines 6-29; column 3, lines 11-19 and 50-62; column 4, lines 18-
31; column 4, line 46 to column 5, line 24; column 5, lines 39-55; Figure 1A; Figure 1B).
Dreibelbis has taught that their banking system is usable in a system with several processes 
(Dreibelbis column 8, lines 16-40), such as the multiple threaded system in Parady. A person of 
ordinary skill in the art at the time the invention was made, and as taught by Dreibelbis, would 
have recognized that the memory bank system of Dreibelbis provides extraordinarily high 
parallelism and significantly improves slower processor access to a shared cache (Dreibelbis
column 3, lines 11-19 and 50-55). Therefore, it would have been obvious to a person of ordinary skill in
the art at the time the invention was made to incorporate the memory banking of Dreibelbis in 
the device of Parady to increase parallelism and improve cache access speed. 

7. Claim 15 is nearly identical to claim 1, differing in its parent claim, but encompassing the 
same scope. Therefore, claim 15 is rejected for the same reasons as claim 1.

8. Regarding claims 2 and 16, taking claim 2 as exemplary, Parady in view of Dreibelbis 
has taught the execution unit of claim 1, wherein the register set is logically partitioned into a
plurality of relatively addressable windows (Parady Col.3 lines 43-49 and Col.4 lines 1-8). Here,
the register file is divided into four register files for four threads (Parady Col.3 lines 43-45), and
there is a thread field in each instruction that identifies which thread an instruction's operands
come from (Parady Col.4 lines 1-8). This makes each register in each register file relatively 
addressable, being differentiated from each other relative to their 2-bit thread field. 
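For illustration only, the relative addressing described above can be sketched as follows. The window size of 32 registers, the flat physical numbering, and the name physical_register are assumptions made for this example and are not taken from Parady; only the 2-bit thread field and the four per-thread register files come from the cited passages.

```python
THREAD_BITS = 2    # the 2-bit thread field discussed above
WINDOW_SIZE = 32   # hypothetical registers per thread window; not from Parady


def physical_register(thread_field: int, reg_offset: int) -> int:
    """Map (thread field, offset within the window) to a flat register index."""
    if not 0 <= thread_field < (1 << THREAD_BITS):
        raise ValueError("thread field must fit in 2 bits")
    if not 0 <= reg_offset < WINDOW_SIZE:
        raise ValueError("offset must fall inside one window")
    # Each thread's window starts at thread_field * WINDOW_SIZE.
    return thread_field * WINDOW_SIZE + reg_offset


# The same offset names a different register for each thread.
print([physical_register(t, 5) for t in range(4)])
```

In this sketch the offset is interpreted relative to the start of the window selected by the thread field, so identical offsets in different threads never collide.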

9. Claim 16 is nearly identical to claim 2, differing in its parent claim, but encompassing the 
same scope. Therefore, claim 16 is rejected for the same reasons as claim 2. 

10. Regarding claims 3 and 17, taking claim 3 as exemplary, Parady in view of Dreibelbis 
has taught the execution unit of claim 2, wherein the number of windows of the register set is 
related to the number of threads that can execute in the processor (Parady Col.3 lines 43-49). 

11. Claim 17 is nearly identical to claim 3, differing in its parent claim, but encompassing the
same scope. Therefore, claim 17 is rejected for the same reasons as claim 3. 

12. Regarding claims 4, 18 and 21, taking claim 4 as exemplary, Parady in view of Dreibelbis 
has taught the execution unit of claim 1, wherein relative addressing allows an executing thread
to access the register set relative to the starting point of a window (Parady Col. 3 lines 43-49 and
Col.4 lines 1-8). Here, the register file is divided into four register files for four threads (Parady
Col. 3 lines 43-45), and there is a thread field in each instruction that identifies which thread an
instruction's operands come from (Parady Col.4 lines 1-8). This makes each register in each
register file relatively addressable, being differentiated from each other relative to their 2-bit
thread field, allowing a thread to access registers associated with its 2-bit thread field.

13. Claims 18 and 21 are nearly identical to claim 4, differing in their parent claims, but 
encompassing the same scope. Therefore, claims 18 and 21 are rejected for the same reasons as
claim 4. 

14. Regarding claims 5 and 22, taking claim 5 as exemplary, Parady in view of Dreibelbis 
has taught the execution unit of claim 1, wherein the register set is absolutely addressable, where
the register set may be accessed for an executing thread by providing an exact address (Parady
Col.3 lines 43-49 and Col.4 lines 1-8, 18-22). As shown above in paragraphs 23 and 27, the
register set is relatively addressable using a 2-bit thread field that specifies which thread, and
consequently which register window, an instruction's operands come from. However, the 2-bit
thread field can also be used to inter-relate two threads (Parady Col.4 lines 18-22), thus allowing
one thread to access any other register in any other thread, providing absolute addressability.

15. Claim 22 is nearly identical to claim 5, differing in its parent claim, but encompassing the 
same scope. Therefore, claim 22 is rejected for the same reasons as claim 5. 

16. Regarding claim 19, Parady has taught a processor unit comprising: 

a. An execution unit for execution of multiple context threads, the execution unit 
comprising:

i. An arithmetic logic unit to process data for executing threads (Parady
Col.2 lines 18-29 and Col.3 lines 19-25), 

ii. Control logic to control the operation of the arithmetic logic unit (Parady 
Col.3 lines 10-18); 

iii. A general purpose register set (Parady 48 of Fig. 1/Fig.3) to store and
obtain operands for the arithmetic logic unit (Parady see Fig. 3), the 
register set comprising a plurality of two-ported random access memory 
devices. While not taught explicitly, it is inherent in the operation of a 
register file that it has at least one port to read and one port to write data in 
and out of the register file, and thus inherently a register file has at least 
two ports. 

17. Parady has not explicitly taught the register set comprising two effective read ports and
one effective write port; and a data link between the arithmetic logic unit and the one effective 
write port of the general purpose register set, wherein the data link allows the arithmetic logic 
unit to write to different two-ported random access memory devices in the general purpose 
register set through the one effective write port. However, Parady has taught that register files 
often have multiple ports (Parady column 5, lines 30-31). Dreibelbis has taught the register set 
comprising two effective read ports and one effective write port; and a data link between the 
arithmetic logic unit and the one effective write port of the general purpose register set, wherein 
the data link allows the arithmetic logic unit to write to different two-ported random access 
memory devices in the general purpose register set through the one effective write port 
(Dreibelbis Abstract; column 2, lines 6-29; column 3, lines 11-19 and 50-62; column 4, lines 18-
31; column 4, line 46 to column 5, line 24; column 5, lines 39-55; Figure 1A; Figure 1B).
Dreibelbis has taught that their banking system is usable in a system with several processes 
(Dreibelbis column 8, lines 16-40), such as the multiple threaded system in Parady. A person of 
ordinary skill in the art at the time the invention was made, and as taught by Dreibelbis, would 
have recognized that the memory bank system of Dreibelbis provides extraordinarily high 
parallelism and significantly improves slower processor access to a shared cache (Dreibelbis 
column 3, lines 11-19 and 50-55). Therefore, it would have been obvious to a person of ordinary skill in
the art at the time the invention was made to incorporate the memory banking of Dreibelbis in 
the device of Parady to increase parallelism and improve cache access speed. 

18. Regarding claim 20, Parady in view of Dreibelbis has taught the processor of claim 19,
wherein the register set is logically partitioned into a plurality of relatively addressable windows, 
where the number of windows of the register set is related to the number of threads that can 
execute in the processor (Parady Col.3 lines 43-49 and Col.4 lines 1-8). Here, the register file is 
divided into four register files for four threads (Parady Col.3 lines 43-45), and there is a thread 
field in each instruction that identifies which thread an instruction's operands come from (Parady
Col.4 lines 1-8). This makes each register in each register file relatively addressable, being 
differentiated from each other relative to their 2-bit thread field. 

19. Regarding claims 30-32, Parady in view of Dreibelbis has taught

a. Wherein the register set comprises a first number n of two-ported random access 
memory devices, a second number r of effective read ports, and a third number w 
of effective write ports, where n > 2, 2 < r < n, and 2 < w < n-1 (Applicant's claim 
30) (Dreibelbis Abstract; column 2, lines 6-29; column 3, lines 11-19 and 50-62;
column 4, lines 18-31; column 4, line 46 to column 5, line 24; column 5, lines 39-
55; Figure 1A; Figure 1B);

b. Wherein storing and obtaining operands comprises storing and obtaining operands 
within the general purpose register comprising a first number n of two-ported 
random access memory devices, a second number r of effective read ports, and a 
third number w of effective write ports, where n > 2, 2 < r < n, and 2 < w < n-1 
(Applicant's claim 31) (Dreibelbis Abstract; column 2, lines 6-29; column 3, lines 
11-19 and 50-62; column 4, lines 18-31; column 4, line 46 to column 5, line 24; 
column 5, lines 39-55; Figure 1A; Figure 1B); and

c. Wherein the general purpose register set comprises a first number n of two-ported 
random access memory devices, a second number r of effective read ports, and a 
third number w of effective write ports, where n > 2, 2 < r < n, and 2 < w < n-1 
(Applicant's claim 32) (Dreibelbis Abstract; column 2, lines 6-29; column 3, lines 
11-19 and 50-62; column 4, lines 18-31; column 4, line 46 to column 5, line 24; 
column 5, lines 39-55; Figure 1A; Figure 1B).

20. Referring to claims 33-34, Parady in view of Dreibelbis has taught 

a. Wherein memory addresses of the banks are interleaved (Applicants' claim 33) 
(Dreibelbis Abstract; column 2, lines 6-29; column 3, lines 11-19 and 50-62; 
column 4, lines 18-31; column 4, line 46 to column 5, line 24; column 5, lines 39- 
55; Figure 1A; Figure 1B); and

b. Wherein memory addresses of the banks are interleaved (Applicants' claim 34) 
(Dreibelbis Abstract; column 2, lines 6-29; column 3, lines 11-19 and 50-62; 
column 4, lines 18-31; column 4, line 46 to column 5, line 24; column 5, lines 39-
55; Figure 1A; Figure 1B).
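For illustration only, one common form of address interleaving across banks can be sketched as follows. Low-order interleaving, the four-bank default, and the name bank_and_row are assumptions made for this example; the sketch does not purport to reproduce the specific arrangement of Dreibelbis.

```python
def bank_and_row(address: int, num_banks: int = 4) -> tuple[int, int]:
    """Low-order interleaving: consecutive addresses fall in consecutive banks."""
    if address < 0 or num_banks <= 0:
        raise ValueError("address must be non-negative and num_banks positive")
    # The remainder selects the bank; the quotient selects the row inside it.
    return address % num_banks, address // num_banks


# Addresses 0..7 spread over four hypothetical banks.
for addr in range(8):
    print(addr, bank_and_row(addr))
```

Under this assumed scheme, neighboring addresses land in different banks, so accesses to adjacent words can proceed in separate banks during the same cycle.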

21. Claims 6-7 and 23-25 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Parady, U.S. Patent Number 5,933,627 (herein referred to as Parady) in view of Dreibelbis et al., 
U.S. Patent Number 5,875,470 (herein referred to as Dreibelbis), as applied to claims 1-5 above, 
and further in view of Waldspurger et al., "Register Relocation: Flexible Contexts for 
Multithreading" (herein referred to as Waldspurger). 

22. Regarding claims 6 and 23, taking claim 6 as exemplary, Parady has taught the execution 
unit of claim 1, wherein the control logic further comprises: 

a. Context switching logic (Parady 112 of Fig.3) fed by signals from a plurality of
shared resources (Parady Col.3 lines 57-65). 

23. Parady has not explicitly taught wherein the signals cause the context event logic to 
indicate that threads are either available or unavailable for execution. 

24. However, Waldspurger has taught a context switch scheduler that comprises a circularly- 
linked "ready queue" which determines which contexts are ready for execution when a context 
switch is required in order to provide fast context switching (Waldspurger paragraph 1 of Sec. 
2.2). One of ordinary skill in the art would have recognized that it is a primary goal of 
microprocessor designers to reduce delays in their datapath, such as those introduced when a 
context switch is required, thereby increasing the speed and throughput of their processors. 
Therefore, one of ordinary skill in the art would have found it obvious to modify the processor of 
Parady to provide threads that are available for execution in the manner of Waldspurger so that 
context switches can be performed fast, thus increasing the processor speed.

25. Regarding claims 7 and 23, taking claim 7 as exemplary, Parady in view of Waldspurger
has taught the execution unit of claim 6, wherein the control logic addresses a set of memory 
locations for storing a list of available threads that correspond to threads that are ready to be 
executed and a set of memory locations for storing a list of unavailable threads that are not ready 
to be executed (see above paragraph 27 and Waldspurger paragraph 1 of Sec. 2.2). Here, the "set 
of memory locations" is a circularly-linked queue, such that the next threads that are ready to be
executed are at the "head" of the list, while those that are not ready, or were recently switched 
from, reside at the "tail" of the list (Waldspurger Sec. 2.2). 
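For illustration only, the queue behavior described above can be sketched as follows. The class name ReadyQueue, its method names, and the use of a double-ended queue in place of a linked list are assumptions made for this example; the sketch is not Waldspurger's implementation.

```python
from collections import deque


class ReadyQueue:
    """Ready contexts wait in order; the running context rejoins at the tail."""

    def __init__(self, current: str, ready: list[str]) -> None:
        self.current = current
        self.ready = deque(ready)  # head = next to run, tail = most recently run

    def context_switch(self) -> str:
        """Move the running context to the tail and resume the head."""
        if not self.ready:
            return self.current  # nothing else is ready; keep running
        self.ready.append(self.current)
        self.current = self.ready.popleft()
        return self.current


q = ReadyQueue("T0", ["T1", "T2"])
print(q.context_switch(), list(q.ready))
```

Each switch in this sketch touches only the two ends of the queue, which reflects the fast context switching attributed to the scheme above.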

26. Claim 23 is nearly identical to claims 6 and 7, differing in its parent claim, but 
encompassing the same scope as claims 6 and 7. Therefore, claim 23 is rejected for the same 
reasons as claims 6 and 7. 

27. Regarding claim 24, Parady in view of Waldspurger has taught the execution unit of 
claim 23, wherein execution of a context swap instruction causes a currently running thread to be 
swapped out to the unavailable thread memory set and a thread from the available thread 
memory set to begin execution within a single execution cycle (Parady Fig.3, Col.2 lines 18-25, 
Col.3 lines 57-65 and Waldspurger paragraphs 2-5 of Sec. 2.2.). Here, a load or store operation 
signals a context switch (Parady Fig.3 and Col.3 lines 57-65), and the context switch steps store 
the current context at the "tail" of the circularly-linked list and update the current context to be 
the thread that was next in line to be executed (Waldspurger paragraphs 2-5 of Sec. 2.2). 

28. Regarding claim 25, Parady in view of Waldspurger has taught the execution unit of 
claim 23, wherein execution of the context swap instruction specifies one of the signal inputs and 
upon receipt of the specified signal input causes the swapped out thread to be stored in the 
available thread memory set (Parady Fig.3, Col.2 lines 18-25, Col.3 lines 57-65 and Waldspurger
paragraphs 2-5 of Sec. 2.2.). Here, a load or store operation signals a context switch (Parady 114
of Fig.3 and Col.3 lines 57-65), and the context switch steps store the current context at the "tail" 
of the circularly-linked list and update the current context to be the thread that was next in line to 
be executed (Waldspurger paragraphs 2-5 of Sec. 2.2). 

29. Claim 26 is rejected under 35 U.S.C. 103(a) as being unpatentable over Parady in view of 
Dreibelbis in view of Waldspurger, as applied to claim 23 above, and further in view of Trauben 
et al., U.S. Patent No. 5,509,130. 

30. Regarding claim 26, Parady in view of Waldspurger has taught the execution unit of 
claim 23, but has not explicitly taught wherein execution of the context swap instruction
specifies a defer one operation which causes execution of one more instruction and then causes 
the current context to be swapped out. 

31. However, Trauben has taught a branch delay instruction which causes the execution of
one instruction before changing context in order to hide the latency of computing and fetching 
the branch target (Trauben Col. 14 lines 41-60). One of ordinary skill in the art would have 
recognized that it is desirable to reduce the amount of delay in a microprocessor and thus allow 
faster execution times. Therefore, one of ordinary skill in the art would have found it obvious to 
modify the processor of Parady in view of Waldspurger to include a branch delay instruction 
which allows an instruction to execute while computing and fetching a branch target so that the 
latency of the operation can be avoided. 

Response to Arguments

32. Applicant's arguments with respect to claims 1-7, 15-26, and 30-34 have been considered
but are moot in view of the new ground(s) of rejection. 



33. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Aimee J. Li whose telephone number is (571) 272-4169. The 
examiner can normally be reached on M-T 7:30am-5:00pm.

34. If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Eddie Chan can be reached on (571) 272-4162. The fax phone number for the 
organization where this application or proceeding is assigned is 703-872-9306. 

35. Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 



Conclusion 



AJL 

Aimee J. Li 
26 July 2006 




